Abstract. A binary tanglegram is a pair (S, T) of binary trees whose leaf sets are in one-to-one correspondence; matching leaves are connected by inter-tree edges. For applications, for example in phylogenetics, it is essential that both trees are drawn without edge crossings and that the inter-tree edges have as few crossings as possible. It is known that finding a drawing with the minimum number of crossings is NP-hard and that the problem is fixed-parameter tractable with respect to that number. We prove that under the Unique Games Conjecture there is no constant-factor approximation for general binary trees. We show that the problem is hard even if both trees are complete binary trees. For this case we give an $O(n^3)$-time 2-approximation and
a new and simple fixed-parameter algorithm. We show that the maximization version 
of the dual problem for general binary trees can be reduced to a version of MaxCut 
for which the algorithm of Goemans and Williamson yields a 0.878-approximation. 



1 Introduction 

In this paper we are interested in drawing so-called tanglegrams [15], that is, pairs of trees 
whose leaf sets are in one-to-one correspondence. The need to visually compare pairs of 
trees arises in applications such as the analysis of software projects, phylogenetics, or clustering. In the first application, trees may represent package-class-method hierarchies or the decomposition of a project into layers, units, and modules. The aim is to analyze changes in the hierarchy over time or to compare human-made decompositions with automatically generated ones. Whereas trees in software analysis can have nodes of arbitrary degree, trees from our second application, that is, (rooted) phylogenetic trees, are binary trees. This makes binary tanglegrams an interesting special case, see Fig. 1. Hierarchical clusterings, our third application, are usually visualized by a binary tree-like structure called a dendrogram, where elements are represented by the leaves and each internal node of the tree represents the cluster containing the leaves in its subtree. Pairs of dendrograms stemming from different clustering processes of the same data can be compared visually using tanglegrams.

Fig. 1: A binary tanglegram showing two evolutionary trees for pocket gophers [8].

In this paper we consider binary tanglegrams if not stated otherwise. From the application point of view it makes sense to insist that (a) the trees under consideration are drawn plane (that is, without edge crossings), (b) each leaf of one tree is connected by an additional edge to the corresponding leaf in the other tree, and (c) the number of crossings among the additional edges is minimized. As in the bioinformatics literature (e.g., [15, 12]), we call this the tanglegram layout (TL) problem; Fernau et al. [6] refer to it as two-tree crossing minimization. Note that we are interested in the minimum number of crossings for visualization purposes; this number is not intended to be a tree-distance measure. Examples of such measures are nearest-neighbor interchange and subtree transfer [2].

Related problems. In graph drawing, the so-called two-sided crossing minimization problem (2SCM) is an important problem that occurs when computing layered graph layouts. Such layouts were introduced by Sugiyama et al. [17] and are widely used for drawing hierarchical graphs. In 2SCM, the vertices of a bipartite graph are to be placed on two parallel lines (called layers) such that vertices on one line are incident only to vertices on the other line. As in TL, the objective is to minimize the number of edge crossings, provided that edges are drawn as straight-line segments. In one-sided crossing minimization (1SCM), the order of the vertices on one of the layers is fixed. Even 1SCM is NP-hard [5]. In contrast to TL, a vertex in an instance of 1SCM or 2SCM can have several incident edges, and the linear order of the vertices in the non-fixed layer is not restricted by the internal structure of a tree. The following is known about 1SCM. The median heuristic of Eades and Wormald [5] yields a 3-approximation, and a randomized algorithm of Nagamochi [13] yields an expected 1.4664-approximation. Dujmovic et al. [3] gave an FPT algorithm that runs in $O^*(1.4664^k)$ time, where k is the minimum number of crossings in any 2-layer drawing of the given graph that respects the vertex order of the fixed layer. The $O^*(\cdot)$-notation ignores polynomial factors.

Previous work. Dwyer and Schreiber [4] studied drawing a series of tanglegrams in 2.5 dimensions, i.e., the trees are drawn on a set of stacked two-dimensional planes. They considered a one-sided version of TL by fixing the layout of the first tree in the stack and then, layer by layer, computing the leaf order of the next tree in $O(n^2 \log n)$ time each. Fernau et al. [6] showed that TL is NP-hard and gave a fixed-parameter algorithm that runs in $O^*(c^k)$ time, where c is a constant that they estimate to be 1024 and k is the minimum number of crossings in any drawing of the given tanglegram. They showed that the problem can be solved in $O(n \log^2 n)$ time if the leaf order of one tree is fixed. This improves the result of Dwyer and Schreiber [4]. They also made the simple observation that the edges of the tanglegram can be directed from one root to the other. Thus the existence of a planar drawing can be verified using a linear-time upward-planarity test for single-source directed acyclic graphs [1]. Later, apparently not knowing these previous results, Lozano et al. [12] gave a quadratic-time algorithm for the same special case, to which they refer as planar tanglegram layout. Holten and van Wijk [9] presented a visualization tool for general tanglegrams that heuristically reduces crossings (using the barycenter method for 1SCM on a per-level basis) and draws inter-tree edges in bundles (using Bézier curves).

Our results. Let us call the restriction of TL to (complete) binary trees the (complete) 
binary TL problem. We first analyze the complexity of TL, see Section 2. We show that 
binary TL is essentially as hard as the MinUncut problem. If the (widely accepted) Unique 
Games Conjecture holds, it is NP-hard to approximate MinUncut [11] — and thus TL — 
within any constant factor. This motivates us to consider complete binary TL. It turns 
out that this special case has a rich structure. We start our investigation by giving a new 
reduction from Max2Sat that establishes the NP-hardness of complete binary TL. 

The main result of this paper is a simple recursive factor-2 approximation algorithm 
for complete binary TL, see Section 3. It runs in $O(n^3)$ time and extends to d-ary trees. Our algorithm can also process general binary tanglegrams without guaranteeing any approximation ratio. It works very well in practice and is quite fast when combined with
branch-and-bound. For an experimental evaluation, see the companion paper [14]. 

Next we consider a dual problem: maximize the number of edge pairs that do not cross. 
We show that this problem (for general binary trees) can be reduced to a version of MaxCut 
for which the algorithm of Goemans and Williamson yields a 0.878-approximation. 

Finally, we investigate the parameterized complexity of complete binary TL. Our parameter is the number k of crossings in an optimal drawing. We give a new FPT algorithm for complete binary TL that is both much simpler and much faster than the FPT algorithm for general binary TL by Fernau et al. [6]. The running time of our algorithm is $O(4^k n^2)$, see Section 4. The analysis of our algorithm is interesting since the parameter does not drop in each level of the recursion.

Formalization. We denote the set of leaves of a tree T by L(T). We are given two rooted trees S and T with n leaves each. We require that S and T are uniquely leaf-labeled, that is, there are bijective labeling functions $\lambda_S\colon L(S) \to \Lambda$ and $\lambda_T\colon L(T) \to \Lambda$, where $\Lambda$ is a set of labels, for example, $\Lambda = \{1, \dots, n\}$. These labelings define a set of new edges $\{uv \mid u \in L(S),\ v \in L(T),\ \lambda_S(u) = \lambda_T(v)\}$, the inter-tree edges. The TL problem consists of finding plane drawings of S and T that minimize the number of induced crossings of the inter-tree edges, assuming that edges are drawn as straight-line segments. We additionally insist that the leaves in L(S) are placed on the vertical line x = 0 and those in L(T) on the line x = 1. The trees S and T themselves are drawn to the left of x = 0 and to the right of x = 1, respectively. For an example of an orthogonal drawing, see Fig. 1. Given uniquely leaf-labeled trees S and T, we denote the resulting instance of TL by (S, T).

The TL problem is purely combinatorial: Given a tree T, we say that a linear order of L(T) is compatible with T if for each node v of T the leaves in the subtree of v form an interval in the order. Given a permutation $\pi$ of $\{1, \dots, n\}$, we call a pair (i, j) an inversion in $\pi$ if $i < j$ and $\pi(i) > \pi(j)$. For fixed orders $\sigma$ of L(S) and $\tau$ of L(T) we define the permutation $\pi_{\tau,\sigma}$, which for a given position in $\tau$ returns the position in $\sigma$ of the leaf having the same label. Now the TL problem consists of finding an order $\sigma$ of L(S) compatible with S and an order $\tau$ of L(T) compatible with T such that the number of inversions in $\pi_{\tau,\sigma}$ is minimum.
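The reduction to inversions can be made concrete. The following Python sketch is our own illustration, not code from the paper: the helper names `count_inversions` and `tl_crossings` are hypothetical, and leaf orders are assumed to be given as lists of labels. It builds $\pi_{\tau,\sigma}$ as defined above and counts its inversions by merge sort in $O(n \log n)$ time.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions) via merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def tl_crossings(sigma, tau):
    """Number of crossing inter-tree edges for leaf orders sigma (of S) and tau (of T).

    pi[p] is the position in sigma of the label found at position p of tau,
    i.e. the permutation pi_{tau,sigma} from the text.
    """
    position_in_sigma = {label: p for p, label in enumerate(sigma)}
    pi = [position_in_sigma[label] for label in tau]
    return count_inversions(pi)[1]


if __name__ == "__main__":
    print(tl_crossings(["a", "b", "c", "d"], ["b", "a", "d", "c"]))  # 2
```

Merge sort is chosen here only because it gives the $O(n \log n)$ bound that is also used later for checking the number of crossings of a fixed layout; a quadratic double loop would compute the same value.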

2 Complexity 

In this section we consider the complexity of binary TL, which Fernau et al. [6] have shown to be NP-complete. We strengthen their findings in two ways. First, we show that it is unlikely that an efficient constant-factor approximation for general binary TL exists. Second, we show that TL remains hard even when restricted to complete binary tanglegrams.

We start by showing that binary TL is essentially as hard as the MinUncut problem. This relates the existence of a constant-factor approximation for TL to the Unique Games Conjecture (UGC) by Khot [10]. The UGC became famous when it was discovered that it implies optimal hardness-of-approximation results for problems such as MaxCut and VertexCover, and rules out constant-factor approximation algorithms for problems such as MinUncut and SparsestCut. We reduce the MinUncut problem to the TL problem, which, by the result of Khot and Vishnoi [11], makes it unlikely that an efficient constant-factor approximation for TL exists.

The MinUncut problem is defined as follows. Given an undirected graph G = (V, E), find a partition $(V_1, V_2)$ of the vertex set V that minimizes the number of edges that are not cut by the partition, that is, $\min_{(V_1, V_2)} |\{uv \in E : u, v \in V_1 \text{ or } u, v \in V_2\}|$. Note that computing an optimal solution to MinUncut is equivalent to computing an optimal solution to MaxCut. Nevertheless, the MinUncut problem is more difficult to approximate.
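For every partition, the number of uncut edges equals |E| minus the number of cut edges, which is why optimal solutions of MinUncut and MaxCut coincide while approximation ratios do not. The exhaustive Python sketch below (our illustration; the name `min_uncut` is hypothetical) only makes the definition concrete and is exponential in |V|.

```python
from itertools import product


def min_uncut(vertices, edges):
    """Exhaustively search all 2-partitions (V1, V2) of `vertices` and return
    (minimum number of uncut edges, V1 of an optimal partition).

    Runs in O(2^|V| * |E|) time, so it is only meant for tiny illustrative graphs.
    """
    best_value, best_part = None, None
    for sides in product((1, 2), repeat=len(vertices)):
        side = dict(zip(vertices, sides))
        # An edge is uncut if both endpoints lie in the same part.
        uncut = sum(1 for u, v in edges if side[u] == side[v])
        if best_value is None or uncut < best_value:
            best_value = uncut
            best_part = {x for x in vertices if side[x] == 1}
    return best_value, best_part


if __name__ == "__main__":
    triangle = [("a", "b"), ("b", "c"), ("a", "c")]
    uncut, v1 = min_uncut(["a", "b", "c"], triangle)
    print(uncut, len(triangle) - uncut)  # 1 2  (MinUncut value, MaxCut value)
```

On a triangle every partition leaves at least one edge uncut, so the MinUncut value is 1 and the MaxCut value is 2.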

Theorem 1. Under the Unique Games Conjecture it is NP-hard to approximate the TL 
problem for general binary trees within any constant factor. 

Proof. As mentioned above, we reduce from the MinUncut problem. Our reduction is similar to the one in the NP-hardness proof by Fernau et al. [6].

Consider an instance G = (V, E) of the MinUncut problem. We construct a TL instance (S, T) as follows. The two trees S and T are identical, and there are three groups of edges connecting leaves of S to leaves of T. For simplicity we allow multiple edges between a pair of leaves. In the actual trees we can replace each such leaf by a binary tree with the appropriate number of leaves.

Suppose $V = \{v_1, v_2, \dots, v_n\}$; then both S and T are constructed as follows. There is a backbone path $(v_1^1, v_1^2, v_2^1, v_2^2, \dots, v_n^2, a)$ from the root node $v_1^1$ to a leaf a. Additionally, there are leaves $l_S(v_i^j)$ and $l_T(v_i^j)$ attached to each node $v_i^j$ for $i \in \{1, \dots, n\}$ and $j \in \{1, 2\}$ in S and T, respectively. The edges form the following three groups.

Group A contains $n^{11}$ edges connecting $l_S(a)$ with $l_T(a)$.

Group B contains, for every $v_i \in V$, $n^7$ edges connecting $l_S(v_i^1)$ with $l_T(v_i^2)$, and $n^7$ edges connecting $l_S(v_i^2)$ with $l_T(v_i^1)$.

Group C contains, for every $v_iv_j \in E$, a single edge from $l_S(v_i^1)$ to $l_T(v_j^1)$.

Next we show how to transform an optimal solution of the MinUncut instance into a solution of the corresponding TL instance. Suppose that in the optimal partition $(V_1^*, V_2^*)$ of G there are k edges that are not cut. Then we claim that there exists a drawing of (S, T) in which $k \cdot n^{11} + O(n^{10})$ pairs of edges cross. It suffices to draw, for each vertex $v_i \in V_1^*$ ($v_i \in V_2^*$), the leaves $l_S(v_i^1)$ and $l_T(v_i^2)$ above (below) the backbones, and the leaves $l_S(v_i^2)$ and $l_T(v_i^1)$ below (above) the backbones. It remains to count: there are $k \cdot n^{11}$ A-C crossings, no A-B crossings, $O(n^{10})$ B-C crossings, and $O(n^4)$ C-C crossings.

Now suppose there exists an α-approximation algorithm for the TL problem for some constant α. Applying this algorithm to the instance (S, T) defined above yields a drawing D(S, T) with at most $\alpha \cdot k \cdot n^{11} + O(n^{10})$ crossings. Let us assume that n is much larger than α. We show that from such a drawing D(S, T) we can reconstruct a cut $(V_1, V_2)$ of G with at most $\alpha \cdot k$ uncut edges. First, observe that if a leaf $l_S(v_i^1)$ is drawn above (below) the backbone in D(S, T), then $l_T(v_i^2)$ must be drawn on the same side of the backbone; otherwise there would be $n^{18}$ A-B crossings. Similarly, $l_S(v_i^2)$ must be on the same side as $l_T(v_i^1)$. Then observe that if $l_S(v_i^1)$ is drawn above (below) the backbone in D(S, T), then $l_S(v_i^2)$ must be drawn below (above) the backbone; otherwise there would be $n^{14}$ B-B crossings. Finally, observe that if we take the set of vertices $v_i$ for which $l_S(v_i^1)$ is drawn above the backbone as the part $V_1$ of a partition of G, then each edge of E left uncut by this partition causes $n^{11}$ A-C crossings, so the partition leaves at most $\alpha \cdot k$ edges of E uncut.

Hence, an α-approximation for the TL problem yields an α-approximation for the MinUncut problem, which contradicts the UGC. □

The above negative result for (general) binary TL is our motivation to investigate the 
complexity of complete binary TL. It turns out that even this special case is hard. It seems 
difficult to modify the NP-hardness proof of Fernau et al. [6] for binary TL since it uses 
extremely unbalanced trees. 

Our proof (see the appendix) is by reduction from a variant of Max2Sat (where each 
variable occurs at most three times). Our proof is quite different from that of Fernau et 
al., who reduce from MaxCut. We construct a TL instance in which one pair of aligned 
subtrees contains the variable gadgets. The two pairs of aligned subtrees on both sides of
the variable gadgets contain the clause gadgets. The fourth pair of aligned subtrees on the 
same level has no crossings. Each clause gadget is modeled by a pair of smaller subtrees, 
see Fig. 12. These are connected by inter-tree edges to the gadgets of the two corresponding 
variables. These edges cause exactly one additional crossing for each unsatisfied clause in 
an optimal solution. Thus we can infer the maximum number of satisfied clauses from an 
optimal TL solution. 

Theorem 2. The TL problem is NP-hard even for complete binary tanglegrams.

3 Approximation

We now present our main result, a 2-approximation algorithm for complete binary TL that runs in $O(n^3)$ time. The idea is to split a given tanglegram recursively at the roots of the two trees into two subinstances, each again consisting of a pair of complete binary trees. Let (S, T) be a subinstance of $(S_0, T_0)$ with subtrees $S \subseteq S_0$ and $T \subseteq T_0$ rooted at nodes $v_S \in S_0$ and $v_T \in T_0$, respectively. When treating (S, T), we use the following pieces of information.

Fig. 2: Context of subinstance $(S, T) = ((S_1, S_2), (T_1, T_2))$.

Fig. 3: Labels for a particular subinstance (S, T). The numbers at the nodes show the choice taken (swap/do not swap children) at that step of the recursion that led to S and T.

Firstly, associated with $v_S$ and $v_T$ we have labels $\ell_S$ and $\ell_T$ that indicate which choices in the recursion so far led to the current subinstance. A label is a bit string in which each bit records which of the two choices was taken at the corresponding node on the path from the root of the original tree to the current node. The length of the labels, which we denote by $|\ell_S|$ and $|\ell_T|$, encodes the depth of the recursion (see Fig. 3).

We also assign labels to some subtrees of $(S_0, T_0)$ other than S and T. Given a leaf $v \in T_0 \setminus T$, we define the largest T-avoiding tree of v to be the largest complete binary subtree of $T_0$ that contains v but not T. Largest S-avoiding trees are defined analogously for leaves in $S_0 \setminus S$. Each largest S- or T-avoiding tree receives a label in the same way as S and T. For a given subinstance (S, T), there are $2(|\ell_S| + 1) = 2(|\ell_T| + 1)$ different labels. Note that the labels of the avoiding trees are relative to the labels of $v_S$ and $v_T$, that is, a different subinstance leads to different labels. If we refer (in the context of a subinstance (S, T)) to the label of a leaf $v \in T_0 \setminus T$, we mean the label of the largest T-avoiding tree of v.

Secondly, since S and T are part of a larger tree, some of the leaves of S may not have the 
matching leaf in T (and vice versa). This means that at some previous step of the algorithm, 
it was decided that such leaves will be matched to leaves in some other subtrees, above or 
below (S,T). We will not know exactly to which leaves they are matched, but we will know, 
for each leaf, the label of the subtree that contains the matching leaf. 

At each level of the recursion we have to choose one of four configurations. At node $v_S$ on the left side, we must choose between placing $S_1$ above $S_2$ or the other way around. On the right side, at $v_T$, there are also two different ways of placing $T_1$ and $T_2$. For each of the four configurations we invoke the algorithm recursively twice: once for the top half and once for the bottom half. We return the configuration with the smallest number of crossings.

When counting the crossings that a configuration creates, we distinguish two types: 
current-level and lower-level crossings. 
Fig. 4: Different types of current-level crossings. For type (d), the crossing is considered current-level only if the right leaves of the edges that cross have different labels, that is, if $\ell_T' \neq \ell_T''$.



Current-level crossings are crossings that can be avoided at this level by choosing one of 
the four configurations for the subtrees, independently of the choices made elsewhere in
the recursion. Figure 4 illustrates the four different types (a)-(d) of current-level crossings. 
For type (d), we remark that crossings are considered to be current-level only if the largest 
S- and T-avoiding trees that contain the endpoints of the edges outside S and T are different. 
Crossings of type (d) where both endpoints belong to the same largest T-avoiding tree cannot 
be counted at this point. We call them indeterminate crossings. 

Lower-level crossings are crossings that appear based on choices taken by solving the 
subinstances of S and T recursively. We cannot do anything about them at this level, but 
we know their exact number after solving the subinstances. 

Here is a sketch of the algorithm. 



1. For all four choices of arranging $\{S_1, S_2\}$ and $\{T_1, T_2\}$, compute the total number of lower-level crossings recursively. Before each recursive call $(S_i, T_j)$, we assign proper labels to some of the leaves of S and T, as follows. All leaves in $S_i$ that connect to $T_{3-j}$ (that is, $T_1$ if j = 2, $T_2$ otherwise) get the label $\ell_T$ with a 0 or 1 appended, depending on whether $T_j$ is above or below $T_{3-j}$. Then we do the analogue for all leaves of $T_j$ connected to $S_{3-i}$.

2. For each choice $(S_i, T_j)$, compute the number of current-level crossings (details below).

3. Return the choice that has the smallest sum of lower-level and current-level crossings.

It is important to notice that the labels are needed to propagate as much information as possible to the smaller subinstances. For example, even though at this stage of the recursion it is clear that the leaves of, say, $T_{3-j}$ are above the leaves of the subtrees below T, once we recurse into the top subinstance this information would be lost, implying that what was a current-level crossing at this stage would become an indeterminate crossing later. The labeling prevents this loss of information.

The number of current-level crossings can be computed in linear time as follows. We go through all inter-tree edges incident to leaves of each of the four subtrees and put each edge into one of at most $O(\log n)$ different classes, depending on the labels of the other endpoints of the edges. This is done in linear time. Depending on where the largest S- or T-avoiding trees go (that is, above or below), the edge pairs belonging to a specific pair of labels either all intersect or all do not intersect. Hence we can count the total number of current-level crossings by multiplying the cardinalities of those $O(\log^2 n)$ pairs of classes whose edges all intersect each other and summing the products.

Fig. 5: (…) classified according to the areas of its endpoints.

Fig. 6: There are 14 groups of edges w.r.t. $((S_1, S_2), (T_1, T_2))$.

The running time of the algorithm satisfies the recurrence $T(n) \le 8T(n/2) + O(n)$ (four configurations, each solved by two recursive calls on subinstances of half the size), which solves to $T(n) = O(n^3)$.
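Unrolling the recurrence (equivalently, case 1 of the master theorem with a = 8 and b = 2) makes the cubic bound explicit. This derivation is ours; c denotes the constant hidden in the O(n) term:

$$
T(n) \;\le\; \sum_{i=0}^{\log_2 n} 8^{i}\, c\,\frac{n}{2^{i}} \;=\; c\,n \sum_{i=0}^{\log_2 n} 4^{i} \;=\; c\,n \cdot \frac{4^{\log_2 n + 1} - 1}{3} \;=\; O(n \cdot n^{2}) \;=\; O(n^{3}).
$$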

Theorem 3. Given a complete binary tanglegram $(S_0, T_0)$ with n inter-tree edges, the recursive algorithm computes in $O(n^3)$ time a drawing of $(S_0, T_0)$ that has at most twice as many crossings as an optimal drawing.

Proof. Fix an optimal drawing δ of $(S_0, T_0)$. For a given subinstance (S, T) of $(S_0, T_0)$, the algorithm tries all four possible layouts of $S = (S_1, S_2)$ and $T = (T_1, T_2)$. Assume that in δ, (S, T) is drawn as $((S_1, S_2), (T_1, T_2))$. We distinguish between four different areas for the endpoints of the edges: above (S, T), in $(S_1, T_1)$, in $(S_2, T_2)$, and below (S, T). We number these areas from 0 to 3 (see Fig. 5). This allows us to classify the edges into 16 groups (two of which, 0-0 and 3-3, are not relevant). We denote the number of i-j edges, that is, edges from area i to area j, by $n_{ij}$ (for $i, j \in \{0, 1, 2, 3\}$). Figure 6 shows the 14 relevant groups of edges.

The only edge crossings that our recursive algorithm cannot take into account are the 
indeterminate crossings, which occur when the two edges connect to leaves above or below 
(S, T) that are in the same largest S- or T-avoiding tree. This is the case if both leaves have 
the same label. Such crossings cannot be predicted from the current subinstance because they 
depend on the relative position of the other two endpoints of the edges. We can, however, 
bound the number of these crossings. 

We observe that any crossing of that type at the current subinstance was, in some previous step of the recursion, a crossing between two 1-2 edges or two 2-1 edges. We can upper-bound the number of these crossings by $\binom{n_{12}}{2} + \binom{n_{21}}{2}$. Let $c_{\mathrm{alg}}$ be the number of crossings in the solution produced by the algorithm, and let $c_{\mathrm{opt}}$ be the number of crossings of δ. Then

$$c_{\mathrm{alg}} \le c_{\mathrm{opt}} + \binom{n_{12}}{2} + \binom{n_{21}}{2} \le c_{\mathrm{opt}} + \frac{n_{12}^2 + n_{21}^2}{2}. \tag{1}$$

Fig. 7: Example of a tanglegram for which the approximation algorithm may output a drawing (left) that has roughly twice as many crossings as the optimal drawing (right).

Since our (sub)trees are complete, $S_1$ and $T_1$ have the same number of leaves, and so do the parts of $S_0$ and $T_0$ above (S, T). Hence we have $n_{10} + n_{12} + n_{13} = n_{01} + n_{21} + n_{31}$ and $n_{01} + n_{02} + n_{03} = n_{10} + n_{20} + n_{30}$. These two equalities yield $n_{12} \le n_{01} - n_{10} + n_{21} + n_{31}$ and $n_{01} - n_{10} \le n_{20} + n_{30}$, respectively, and thus we obtain $n_{12} \le n_{20} + n_{30} + n_{21} + n_{31}$ and hence, multiplying by $n_{12} \ge 0$, $n_{12}^2 \le n_{12} \cdot (n_{20} + n_{30} + n_{21} + n_{31})$.

It is easy to verify that all the terms on the right-hand side of the last inequality count crossings that cannot be avoided and must be present in the optimal solution as well. Hence $n_{12}^2 \le c_{\mathrm{opt}}$, and symmetrically $n_{21}^2 \le c_{\mathrm{opt}}$. Plugging this into (1) yields $c_{\mathrm{alg}} \le 2 \cdot c_{\mathrm{opt}}$. □

For our algorithm, the approximation factor of 2 is tight: let n = 4m (with m a power of two, so that both trees are complete), let S have leaves ordered 1, ..., 4m, and let T have leaves ordered 1, ..., m, 3m, ..., 2m+1, m+1, ..., 2m, 3m+1, ..., 4m (see Fig. 7). Then our algorithm may construct a drawing with $m^2 + 2\binom{m}{2} = m(2m - 1)$ crossings, while the optimal drawing has only $m^2$ crossings; the ratio $(2m-1)/m$ tends to 2.
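The tightness example can be checked by brute force for small m. The sketch below is ours (the names `compatible_orders`, `crossings`, and `tightness_instance` are hypothetical) and does not implement the recursive algorithm itself. Assuming m is a power of two, it enumerates all $2^{n-1}$ leaf orders compatible with each complete binary tree and compares the listed layout with the optimum.

```python
from itertools import product


def compatible_orders(leaves):
    """Yield every leaf order compatible with the complete binary tree whose
    leaves, read top to bottom, are `leaves` (length must be a power of two).
    Each internal node may keep or swap its two children."""
    if len(leaves) == 1:
        yield list(leaves)
        return
    half = len(leaves) // 2
    tops = list(compatible_orders(leaves[:half]))
    bottoms = list(compatible_orders(leaves[half:]))
    for top, bottom in product(tops, bottoms):
        yield top + bottom  # children in the given order
        yield bottom + top  # children swapped


def crossings(sigma, tau):
    """Inversions of pi_{tau,sigma}, counted naively in O(n^2) time."""
    pos = {label: p for p, label in enumerate(sigma)}
    pi = [pos[label] for label in tau]
    return sum(1 for i in range(len(pi))
               for j in range(i + 1, len(pi)) if pi[i] > pi[j])


def tightness_instance(m):
    """Leaf orders of S and T from the tightness example (n = 4m)."""
    s = list(range(1, 4 * m + 1))
    t = (list(range(1, m + 1)) + list(range(3 * m, 2 * m, -1))
         + list(range(m + 1, 2 * m + 1)) + list(range(3 * m + 1, 4 * m + 1)))
    return s, t


if __name__ == "__main__":
    m = 2
    s, t = tightness_instance(m)
    print(t)                # [1, 2, 6, 5, 3, 4, 7, 8]
    print(crossings(s, t))  # 5 = m^2 + C(m, 2)
    best = min(crossings(a, b)
               for a in compatible_orders(s) for b in compatible_orders(t))
    print(best)             # 4 = m^2
```

For m = 2, the leaf orders as listed give $m^2 + \binom{m}{2} = 5$ crossings, flipping the reversed block gives $m^2 = 4$, and the exhaustive search confirms that 4 is optimal. Brute force is only feasible for tiny n, since the search space has $2^{2(n-1)}$ layouts.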

General binary trees. Obviously, our recursive algorithm can also be applied to general, 
non-complete tanglegrams. In this case, however, the approximation factor no longer holds, which is also indicated by Theorem 1. The companion paper [14] contains an extensive experimental evaluation of several heuristic algorithms for TL in which our recursive algorithm turned out to be a successful method for both complete and general binary tanglegrams.

Generalization to d-ary trees. The recursive algorithm can be generalized to complete d-ary trees. The recurrence relation of the algorithm's running time changes to $T(n) \le d \cdot (d!)^2 \cdot T(n/d) + O(n)$, since we need to consider all $d!$ subtree orderings of both trees, each combination of which triggers d subinstances of size n/d. Again, by the master method, this resolves to $T(n) = O(n^{1 + 2\log_d (d!)})$. At the same time, the approximation factor increases to $1 + \binom{d}{2}$.
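The exponent comes from case 1 of the master theorem. With $a = d\,(d!)^2$ subproblems of size $n/b$, $b = d$, and linear non-recursive work, our worked derivation is:

$$
\log_d a = \log_d d + \log_d (d!)^2 = 1 + 2\log_d(d!) > 1
\;\Longrightarrow\;
T(n) = O\!\left(n^{1 + 2\log_d(d!)}\right).
$$

For d = 2 this gives $1 + 2\log_2 2 = 3$, recovering the $O(n^3)$ bound for binary trees; for d = 3 the exponent is $1 + 2\log_3 6 \approx 4.26$.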

Maximization version. Instead of the original TL problem, which minimizes the number 
of pairs of edges that cross each other, we may consider the dual problem TL* of maximizing 
the number of pairs of edges that do not cross. The tasks of finding optimal solutions for 
these problems are equivalent, but from the perspective of approximation it makes quite 
a difference which of the two problems we consider. Now we do not assume that we draw
binary trees. Instead, if an internal node has more than two children, we assume that we 
may only choose between a given permutation of the children and the reverse permutation 
obtained by flipping the whole block of children. 
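Because the n inter-tree edges form exactly $\binom{n}{2}$ pairs, each of which either crosses or does not, the two objectives are complementary for every drawing D. The numbers in the second line below are a hypothetical illustration of ours, not data from the paper; they show why the same drawing can be a poor TL solution and still a good TL* solution:

$$
\begin{aligned}
\#\{\text{non-crossing pairs in } D\} &= \binom{n}{2} - \#\{\text{crossing pairs in } D\},\\
n = 100,\ \mathrm{opt}_{\mathrm{TL}} = 10,\ \mathrm{cr}(D) = 1000:\quad
\frac{1000}{10} &= 100 \ \text{(TL ratio)}, \qquad
\frac{4950 - 1000}{4950 - 10} = \frac{3950}{4940} \approx 0.80 \ \text{(TL* ratio)}.
\end{aligned}
$$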

In contrast to the TL problem, which is hard to approximate as we have shown in 
Theorem 1, the TL* problem has a constant-factor approximation algorithm. We show this 
(see the appendix) by reducing TL* to a constrained version of the MaxCut problem, 
which can be approximately solved with a semidefinite programming rounding algorithm by 
Goemans and Williamson [7]. 

Theorem 4. There exists a 0.878-approximation algorithm for the TL* problem.

4 Fixed-Parameter Tractability

We consider the following parameterized problem. Given a complete binary TL instance 
(S,T) and a non-negative integer k, decide whether there exists a TL of S and T with at 
most k induced crossings. Our algorithm for this problem uses a labeling strategy, just as 
our approximation algorithm in Section 3. However, here we do not select the subinstance that gives the minimum number of lower-level crossings; instead, we consider all subinstances and recurse on them. Thus, our algorithm traverses a search tree of branching factor 4. For the search tree to have bounded height, we need to ensure that whenever we go to a subinstance, the parameter value decreases by at least one. For efficient bookkeeping we consider current-level crossings only. At first sight this seems problematic: if a subinstance does not incur any current-level crossings, the parameter will not drop. The following key lemma shows that there is a way out. It says that if there is a subinstance without current-level crossings, then we can ignore the other three subinstances and do not have to branch. This can be seen as a preprocessing step at each branching occasion, a technique that is also exploited in some existing fixed-parameter algorithms. The lemma does not hold for general binary trees.

Lemma 1. Let (S, T) be a pair of complete binary trees forming an instance of the TL problem, and let $v_S$ and $v_T$ be nodes of S and T, respectively, with the same distance to their respective roots. Let $(S_1, S_2)$ be the two subtrees of $v_S$ and $(T_1, T_2)$ the two subtrees of $v_T$. If the subinstance $((S_1, S_2), (T_1, T_2))$ does not incur any current-level crossings, then any ordering of the leaves of this subinstance does not have more crossings than the same ordering of the leaves of each of the other subinstances $((S_1, S_2), (T_2, T_1))$, $((S_2, S_1), (T_1, T_2))$, and $((S_2, S_1), (T_2, T_1))$.

Proof. If the subinstance $((S_1, S_2), (T_1, T_2))$ does not incur any current-level crossings, the edges originating from these four subtrees are edges of the types shown in Fig. 8a (or the symmetric case with no edges between $S_2$ and $T_1$). Let $n_{11}, n_{21}, n_{22}, l_1, l_2, r_1, r_2$ be the numbers of edges as in Fig. 8. Since we consider complete binary trees, we obtain the following equalities: $l_1 = r_1 + n_{21}$, $r_2 = l_2 + n_{21}$, and $r_1 + n_{11} = l_2 + n_{22}$.

Take any fixed ordering of the leaves of the subtrees $S_1, S_2, T_1, T_2$. We first compare the number of crossings of the subinstance $((S_1, S_2), (T_1, T_2))$ with the number of crossings of the subinstance $((S_2, S_1), (T_2, T_1))$ in Fig. 8b. The subinstance $((S_1, S_2), (T_1, T_2))$ can have at most $n_{21}(n_{11} + n_{22})$ crossings that do not occur in $((S_2, S_1), (T_2, T_1))$. However, $((S_2, S_1), (T_2, T_1))$ has at least $l_1(l_2 + n_{21} + n_{22}) + l_2 n_{11} + r_2(r_1 + n_{21} + n_{11}) + r_1 n_{22}$ crossings that do not appear in $((S_1, S_2), (T_1, T_2))$. Inserting the above equalities for $l_1$ and $r_2$, we get

$$(r_1 + n_{21})(l_2 + n_{21} + n_{22}) + l_2 n_{11} + (l_2 + n_{21})(r_1 + n_{21} + n_{11}) + r_1 n_{22} \ge n_{21}(n_{11} + n_{22}),$$

since all terms are non-negative, the first product is at least $n_{21} n_{22}$, and the third product is at least $n_{21} n_{11}$. Thus, the same ordering of leaves does not give more crossings for $((S_1, S_2), (T_1, T_2))$ than it does for $((S_2, S_1), (T_2, T_1))$.

Fig. 8: Edge types and crossings of the instance (S, T): (a) $((S_1, S_2), (T_1, T_2))$; (b) $((S_2, S_1), (T_2, T_1))$; (c) $((S_1, S_2), (T_2, T_1))$.

Next, we compare the number of crossings of the subinstance $((S_1, S_2), (T_1, T_2))$ with the number of crossings of the subinstance $((S_1, S_2), (T_2, T_1))$ in Fig. 8c. Now the number of additional crossings of $((S_1, S_2), (T_1, T_2))$ is at most $n_{21} n_{22}$, and the subinstance $((S_1, S_2), (T_2, T_1))$ has at least $(r_1 + n_{11})(r_2 + n_{22}) + r_2 n_{21}$ more crossings. With the equality $r_1 + n_{11} = l_2 + n_{22}$ (so $r_1 + n_{11} \ge n_{22}$) and the inequality $r_2 + n_{22} \ge n_{21}$ (which follows from $r_2 = l_2 + n_{21}$), we get $(r_1 + n_{11})(r_2 + n_{22}) + r_2 n_{21} \ge n_{22} n_{21}$. Thus, again, $((S_1, S_2), (T_1, T_2))$ does not have more crossings than $((S_1, S_2), (T_2, T_1))$ for the same leaf ordering. By symmetry, the same holds for $((S_2, S_1), (T_1, T_2))$. □

Thus, to decompose the instance into four subinstances we spend $O(n^2)$ time. Therefore we spend $O(4^k n^2)$ time to produce all leaves of our bounded-height search tree (omitting details). At each leaf of the search tree, we obtain a certain layout of (S, T), and the accumulated number of current-level crossings is at most k. This, however, does not mean that the total number of crossings is at most k, since we did not keep track of the indeterminate crossings. Therefore, at each leaf we still need to check how many crossings the corresponding layout has. This can be done in $O(n \log n)$ time. If one of the leaves yields at most k crossings, the algorithm outputs "Yes" and the layout; otherwise it outputs "No".

We summarize our discussion as follows. 

Theorem 5. The algorithm sketched above solves the parameterized version of complete binary TL in $O(4^k n^2)$ time.

5 Open Problems 

We have shown that one cannot expect to find a constant-factor approximation for general 
binary TL. Would it help if one of the two given trees were complete?

We have given a factor-2 approximation for complete binary TL. It is natural to ask 
whether we can do better. 
Appendix



Theorem 2. The TL problem is NP-hard even for complete binary tanglegrams. 

Proof. Recall the Max2Sat problem, which is defined as follows. Given a set $U = \{x_1, \dots, x_n\}$
of Boolean variables, a set $C = \{c_1, \dots, c_m\}$ of disjunctive clauses containing two literals
each, and an integer K, the question is whether there is a truth assignment of the variables 
such that at least K clauses are satisfied. We consider a restricted version of Max2Sat, 
where each variable appears in at most three clauses. This version remains NP-complete [16]. 
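For concreteness, here is a minimal Python sketch of this restricted Max2Sat decision problem. The literal encoding ($+i$ for $x_i$, $-i$ for $\neg x_i$) and all function names are assumptions made for illustration. The brute-force decision procedure is exponential in $n$ and is only meant for tiny instances.

```python
from collections import Counter
from itertools import product


def literal_value(lit, assignment):
    """+i stands for x_i and -i for (not x_i); assignment maps i -> bool."""
    return assignment[abs(lit)] if lit > 0 else not assignment[abs(lit)]


def satisfied_clauses(clauses, assignment):
    """Number of two-literal clauses with at least one true literal."""
    return sum(literal_value(a, assignment) or literal_value(b, assignment)
               for a, b in clauses)


def is_restricted(clauses):
    """True if every variable appears in at most three clauses."""
    occurrences = Counter()
    for a, b in clauses:
        for var in {abs(a), abs(b)}:
            occurrences[var] += 1
    return all(count <= 3 for count in occurrences.values())


def max2sat(n, clauses, K):
    """Brute-force decision: is there an assignment satisfying >= K clauses?"""
    for bits in product((False, True), repeat=n):
        assignment = dict(zip(range(1, n + 1), bits))
        if satisfied_clauses(clauses, assignment) >= K:
            return True
    return False
```

For the four clauses over $x_1, x_2$ with all sign combinations, every assignment satisfies exactly three clauses, so the answer is "yes" for $K = 3$ and "no" for $K = 4$. That instance is not restricted, because each variable occurs four times.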

Our reduction constructs two complete binary trees $S$ and $T$, in which certain aligned
subtrees serve as variable gadgets and others as clause gadgets. We further determine an
integer $K'$ such that the instance $(S, T)$ has at most $K'$ crossings if and only if the
corresponding Max2Sat instance has a truth assignment that satisfies at least $K$ clauses.

The high-level structure of the two trees is depicted in Fig. 9. From top to bottom, 
the four subtrees at level 2 on both sides are a clause subtree, a variable subtree, another 
clause subtree, and finally a dummy subtree. The subtrees are connected to each other by 
edges such that in any optimal solution they must be aligned in the depicted (or mirrored) 
order. Each clause gadget appears twice, once in each clause subtree, and is connected to 
the variable gadgets belonging to its two literals. Pairs of corresponding gadgets in S and 
T are connected to each other. Finally, non-crossing dummy edges connect unused leaves to 
complete S and T. In the following we describe the gadgets in more detail. 




Fig. 9: High-level structure of the two trees S and T. Red edges connect clause and variable gadgets, 
green edges connect corresponding gadget halves, and gray edges are dummy edges to complete the 
trees. 
Variable gadgets. The basic structure of a variable gadget consists of two complete binary
trees with 32 leaves each as shown in Fig. 10. Each tree has three highlighted subtrees of 
size 2, labeled $a, b, c$ and $a', b', c'$, respectively. From each of these subtrees there is one red
connector edge leaving the gadget at the top and one leaving it at the bottom. As long as 
two connector edges from the same tree do not cross each other, they transfer the vertical 
order of the labeled subtrees towards a clause gadget. We define the configuration in Fig. 10a 
as true and the configuration in Fig. 10b as false. If the configuration is in its true state, the 
induced vertical order of the connector edges is a < b < c, otherwise the order is inverse: 
c < b < a. It can easily be verified that both states have the same number of crossings. 
To see that this is optimal, observe that each pair of connector edges from the same subtree
(for example, subtree a) always crosses all 26 gray edges in the gadget. Furthermore, all
24 crossings of two connector edges in the figure are mandatory. Finally, the four crossings
among the gray edges between subtrees 1 and 2' and subtrees 2 and 1' are also optimal.
(Otherwise, if subtree 1 is opposite subtree 2', there are at least 120 gray-gray crossings
in addition to the 24 red-red crossings and the 156 red-gray crossings, as opposed to a total
of 184 crossings in either configuration of Fig. 10.)
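The counts above can be tallied as follows. This assumes, as the numbers suggest, that the 156 red-gray crossings come from the six connector-edge pairs $a, b, c, a', b', c'$, each pair crossing each of the 26 gray edges once:

```latex
\begin{align*}
\text{optimal (Fig.~10a or 10b):}\quad & 24 + 6 \cdot 26 + 4 = 24 + 156 + 4 = 184,\\
\text{subtree 1 opposite subtree } 2'\text{:}\quad & \ge 24 + 156 + 120 = 300 > 184.
\end{align*}
```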

Note that so far the gadget in the figure is designed for a single appearance of the 
variable since the four connector-edge triplets are required for a single clause. However, for 
the Max2Sat reduction each variable can appear up to three times in different clauses. By 
appending a complete binary tree with four leaves as in Fig. 11 to each leaf of the gadget in
Fig. 10 and copying each edge accordingly, the above arguments still hold for the enlarged
trees with 128 leaves each. Unused connector edges in opposite subtrees are linked to each 
other (a to a' etc.) as in Fig. 10b such that the number of crossings in the gadget remains 
balanced for both states. 

Clause gadgets. For each clause $c_i = l_{i1} \vee l_{i2}$, where $l_{i1}$ and $l_{i2}$ denote the two literals,
we create two clause gadgets: one in the upper clause subtrees and one in the lower clause 
subtrees (recall Fig. 9). Each gadget itself consists of two parts: one part uses the
connectors of the first variable in the left tree and those of the second variable in the
right tree, and the other part does the reverse. Fig. 12 shows one such part of the gadget in the lower clause
subtrees, where the connector edges lead upwards. The gadget in the upper clause subtree
is simply a mirrored version.

The basic structure consists of two aligned subtrees with eight leaves as depicted in 
Fig. 12. Three of the leaves on each side serve as the missing endpoints for the triplets of 
connector edges from the corresponding variables. Recall that for a positive literal with value 
true the order of the connector edges is a < b < c, and for a positive literal with value false it 
is c < b < a. (For negative literals the meaning of the orders is inverted.) The two connector 
leaves for the edges labeled a and b are in the same subtree with four leaves, the connector 
leaf for c is in the other subtree. Three cases need to be distinguished. If (1) both literals are 
true, then the configuration in Fig. 12a is optimal with 21 crossings. If (2) only one literal 
is true, then Fig. 12b shows an optimal configuration with 21 crossings again. Here the tree
on the right side is rotated at its root node. Finally, if (3) both literals are false, there are
at least 22 crossings in the gadget, as shown in Fig. 12c. Since this substructure is repeated
four times for each clause, we have $4 \cdot 21 = 84$ induced crossings for satisfied clauses and $4 \cdot 22 = 88$ induced
crossings for unsatisfied clauses.
Fig. 10: The variable gadget in its two optimal configurations with 184 crossings: (a) x = true, (b) x = false. Red edges are drawn solid, whereas dash-dot style is used for gray edges.

Fig. 11: Replacing each edge by four edges: (a) a single gray edge; (b) two pairs of connector edges for a variable used in three clauses.


We construct the gadgets for all variables and clauses and link them together as two 
trees S and T, which are filled up such that they become complete binary trees. The general 
layout is as depicted in Fig. 9, where each dummy leaf in S is connected to the opposite 
dummy leaf in T such that there are no crossings among dummy edges. In each of the four 
main subtrees, all dummy edges are consecutive. Thus, of all dummy edges, only those in the
variable subtree have crossings with exactly half the connector edges.

It remains to compute the minimum number M of crossings that are always necessary, 
even if all clauses are satisfied. Then the Max2Sat instance has a solution with at least 
K satisfied clauses if and only if the constructed TL instance has a solution with at most
$K' = M + 4(|C| - K)$ crossings. We get the corresponding variable assignment directly from
the layout of the variable gadgets. 
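Let $s$ denote the number of clauses satisfied by an assignment. Each unsatisfied clause induces $88 - 84 = 4$ more crossings than a satisfied one, so the equivalence reduces to a one-line calculation. This is a sketch that assumes the optimal layout for that assignment has exactly $M + 4(|C| - s)$ crossings:

```latex
M + 4(|C| - s) \le M + 4(|C| - K) = K'
\iff |C| - s \le |C| - K
\iff s \ge K.
```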
The first step for computing $M$ is to fix an order for the variable gadgets in the variable
subtree. Let this order be $x_1 < x_2 < \dots < x_n$. To enforce this as the vertical order of the
variable gadgets we need to establish links between adjacent gadgets such that any other 
order would increase the number of crossings. For these neighbor links we need eight of the 
128 leaves in each half of each variable gadget as shown in Fig. 13. Since both subtrees below 
the root of $x_i$ in $S$ and both subtrees below the root of $x_{i+1}$ in $T$ are connected to each
other, the minimum number of crossings of those edges is independent of the truth state of
each gadget. However, separating two adjacent variables by tree rotations at higher levels
in $S$ and $T$ leads to a large number of extra crossings, since the eight neighbor links would
cross all variable gadgets between $x_i$ and $x_{i+1}$.
With the order of the variables fixed, we sort all clauses lexicographically and place smaller
clauses towards the top of the clause subtrees. Consider two clause gadgets in the same clause 
subtree. Then in the given clause order there are crossings between their connector-edge 
triplets if and only if the intervals between their respective variables intersect in the variable 
order. Since these crossings are unavoidable, the number of connector-triplet crossings in 
the lexicographic order of the clauses is optimal. Now we can finally compute all necessary 
crossings between connector edges, dummy edges and intra-gadget edges which yields the 
number M. 
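A small Python sketch of this step follows. Clauses are sorted lexicographically by the ordered positions of their two variables, and pairs of clause gadgets are flagged when their variable intervals intersect, following the criterion stated above. Two choices are assumptions: variables are numbered so that $x_i$ has position $i$ in the fixed order, and intervals that share only an endpoint variable count as intersecting. The function names are illustrative.

```python
from itertools import combinations


def interval(clause):
    """Positions of the clause's two variables, smaller first (x_i has position i)."""
    a, b = sorted(abs(lit) for lit in clause)
    return a, b


def lexicographic_order(clauses):
    """Sort clauses lexicographically by their variable intervals."""
    return sorted(clauses, key=interval)


def intervals_intersect(p, q):
    """Closed-interval intersection test (shared endpoints count; an assumption)."""
    return p[0] <= q[1] and q[0] <= p[1]


def crossing_clause_pairs(clauses):
    """Pairs of clauses, in lexicographic order, whose intervals intersect."""
    ordered = lexicographic_order(clauses)
    return [(c, d) for c, d in combinations(ordered, 2)
            if intervals_intersect(interval(c), interval(d))]
```

For the clauses $x_5 \vee x_6$, $x_2 \vee \neg x_4$, and $x_1 \vee x_3$, the sorted order is $(1,3), (2,4), (5,6)$, and only the first two intervals intersect.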

Since each gadget is of constant size, the two trees and the number $M$ can be computed
in polynomial time.

The fact that the complete binary TL problem belongs to the class NP follows immediately
from the NP-completeness of the general TL problem [6]. □

Theorem 4. There exists a 0.878-approximation algorithm for the TL* problem.

Proof. Fix any drawing of the two trees S and T in an instance of the TL* problem. Any 
internal node of each of the trees corresponds to a decision variable. The decision to make 
in each such node is whether to flip the subtree rooted in that node or not. We model 
this situation by a graph; a flip decision corresponds to deciding to which side of a cut the 
corresponding vertex is assigned. 

For each internal node $v$ of a tree in the instance of TL*, the constructed graph $G$ contains
two vertices $v$ and $v'$. For each pair of edges connecting leaves of the two trees, there is one
edge in $G$. Let $l_1$ and $l_2$ ($r_1$ and $r_2$) denote the leaves of $S$ ($T$) incident to this pair of
edges. Let $l$ be the lowest common ancestor of $l_1$ and $l_2$ in $S$ ($l = \mathrm{LCA}(l_1, l_2)$), and let
$r = \mathrm{LCA}(r_1, r_2)$ in $T$. If the considered pair of edges crosses in the initial drawing, then we
have an edge $\{l, r\}$ in $G$. If the pair of edges does not cross in the initial drawing, then there
is an edge $\{l, r'\}$ in $G$.

It remains to observe that cuts in G that separate each pair v, v' correspond to drawings 
of S and T in the instance of the TL* problem. Moreover, edges that are cut in G correspond 
to the pairs of edges that do not cross in the drawing of the two trees. 

The resulting optimization problem is the MaxResCut problem (that is, the MaxCut 
problem with additional constraints forcing certain pairs of vertices to be separated by the 
cut) studied by Goemans and Williamson [7]. Therefore, we may use their semidefinite
programming rounding algorithm to compute a 0.878-approximation of the largest constrained
cut in the graph G. This cut determines which of the subtrees in the initial drawing must 
be flipped to obtain a drawing that is a 0.878-approximation to TL*. □ 
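The correspondence in this proof can be checked by brute force on tiny instances. The Python sketch below makes several illustrative assumptions:

- A binary tree is represented as nested pairs with leaf labels at the bottom, and leaves of $S$ and $T$ are matched by equal labels.
- Flipping a node means swapping its two children.

It builds one edge of $G$ per pair of inter-tree edges as described above and confirms that, for every choice of flips, the number of cut edges equals the number of non-crossing pairs. It does not implement the Goemans-Williamson semidefinite rounding, and all names are hypothetical.

```python
from itertools import combinations, product

# A tree is a leaf label (str) or a pair (left, right); nodes are identified
# by their root-to-node path, a tuple of child indices (0 = upper, 1 = lower).

def internal_nodes(tree, path=()):
    """Yield the path of every internal node."""
    if isinstance(tree, tuple):
        yield path
        yield from internal_nodes(tree[0], path + (0,))
        yield from internal_nodes(tree[1], path + (1,))

def leaf_paths(tree, path=()):
    """Map each leaf label to its root-to-leaf path."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = leaf_paths(tree[0], path + (0,))
    paths.update(leaf_paths(tree[1], path + (1,)))
    return paths

def lca(p, q):
    """Path of the lowest common ancestor = longest common prefix of the paths."""
    k = 0
    while k < min(len(p), len(q)) and p[k] == q[k]:
        k += 1
    return p[:k]

def above(paths, flips, a, b):
    """True if leaf a is drawn above leaf b after swapping children at `flips`.
    Only a flip at LCA(a, b) changes the relative order of a and b."""
    u = lca(paths[a], paths[b])
    return (paths[a][len(u)] == 0) != (u in flips)

def noncrossing_pairs(S, T, flips_s, flips_t):
    ps, pt = leaf_paths(S), leaf_paths(T)
    return sum(above(ps, flips_s, a, b) == above(pt, flips_t, a, b)
               for a, b in combinations(sorted(ps), 2))

def build_graph(S, T):
    """One edge per pair of inter-tree edges: (l, r, crosses) stands for the
    edge {l, r} if the pair crosses initially and for {l, r'} otherwise."""
    ps, pt = leaf_paths(S), leaf_paths(T)
    edges = []
    for a, b in combinations(sorted(ps), 2):
        crosses = above(ps, set(), a, b) != above(pt, set(), a, b)
        edges.append((("S", lca(ps[a], ps[b])), ("T", lca(pt[a], pt[b])), crosses))
    return edges

def cut_size(edges, flips_s, flips_t):
    """Cut induced by the flips: v is on side True iff v is flipped; v' opposite."""
    def side(v):
        return v[1] in (flips_s if v[0] == "S" else flips_t)
    total = 0
    for l, r, crosses in edges:
        other = side(r) if crosses else not side(r)  # side of r, or of r'
        total += side(l) != other
    return total

def subsets(nodes):
    nodes = list(nodes)
    for mask in product((False, True), repeat=len(nodes)):
        yield {n for n, keep in zip(nodes, mask) if keep}

def check_correspondence(S, T):
    """Assert cut size == non-crossing pairs for all flips; return the maximum."""
    edges = build_graph(S, T)
    best = 0
    for fs in subsets(internal_nodes(S)):
        for ft in subsets(internal_nodes(T)):
            value = noncrossing_pairs(S, T, fs, ft)
            assert cut_size(edges, fs, ft) == value
            best = max(best, value)
    return best

if __name__ == "__main__":
    S = (("a", "b"), ("c", "d"))
    T = (("c", "a"), ("d", "b"))
    print(check_correspondence(S, T))  # maximum number of non-crossing pairs
```

For the example trees, the initial drawing has 3 non-crossing pairs out of 6. The best flips reach 5: the cherries $\{a, b\}$ of $S$ and $\{c, a\}$ of $T$ differ, so at least one crossing is unavoidable.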